IBM statistical machine translation for spoken languages
Abstract
We discuss performance-enhancing techniques we developed for the IWSLT 2005 Evaluation Campaign: (i) a phrase acquisition technique that expands phrase boundaries, in a principled manner, to include target words aligned to null source words, and (ii) a system combination technique that selects the minimum-cost translation from the many translations of the same input segment produced by various systems using different phrase translation lexicons. We also discuss the performance of the IBM systems in the Arabic-to-English and Chinese-to-English translation evaluations of the IWSLT 2005 Evaluation Campaign.
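The selection step in technique (ii) can be illustrated with a minimal sketch. It assumes each system attaches a single scalar cost to its output and that these costs are comparable across systems; the abstract does not say how the cost is computed. The `Hypothesis` class, the `select_min_cost` function, the system labels, and the example costs are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One system's translation of an input segment (hypothetical structure)."""
    system: str       # label of the system / phrase translation lexicon (made up)
    translation: str  # the candidate output
    cost: float       # lower is better; assumed comparable across systems


def select_min_cost(hypotheses):
    """Return the hypothesis with the lowest cost for one input segment."""
    if not hypotheses:
        raise ValueError("no hypotheses to choose from")
    # min() returns the first lowest-cost item, so ties go to the earlier system.
    return min(hypotheses, key=lambda h: h.cost)


# Illustrative values only; these are not results from the paper.
candidates = [
    Hypothesis("lexicon_a", "where is the station", 12.4),
    Hypothesis("lexicon_b", "where is station", 13.1),
    Hypothesis("lexicon_c", "where the station is", 15.0),
]
best = select_min_cost(candidates)
print(best.system, "->", best.translation)
```

Selection is done per input segment, so different segments of the same test set may be taken from different systems.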
Similar resources
IJCNLP 2008 Sixth SIGHAN Workshop on Chinese Language Processing
In this paper, we propose an example-based decoder for a statistical machine translation (SMT) system, which is used for spoken language machine translation. This helps to address the re-ordering problem and other problems in spoken language MT, such as frequent omissions and idioms. Through experiments, we show that this approach obtains improvements over the baseline on a Chines...
IBM spoken language translation system evaluation
We discuss performance-enhancing techniques for phrase-based statistical machine translation that have proven effective for Japanese-to-English and Chinese-to-English translation of the BTEC corpus. We also address some issues that arise in evaluating the quality of conversational speech translation.
Grammar Inference and Statistical Machine Translation
NLP researchers face a dilemma: on the one hand, it is unarguably accepted that languages have internal structure and are not merely strings of words. On the other hand, they find it very difficult and expensive to write grammars that have good coverage of language structures. Statistical machine translation tries to cope with this problem by ignoring language structures and using statistical models to depi...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, with a hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of a half-space character. This common incorrect use of the space leads to some s...
Publication date: 2005